fix: Improve wizard.sh and Parakeet/Deepgram worker #297
0xrushi wants to merge 17 commits into SimpleOpenSoftware:dev from
Conversation
Important: Review skipped. Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the … You can disable this status message by setting the …
📝 Walkthrough

This pull request introduces a specialized TranscriptAnnotation model, refactors the annotation API from multi-type support to user-correction focused endpoints, replaces the cron scheduler with a modular task runner using Redis, adds new audio stream workers (Deepgram and Parakeet), and significantly refactors service management and setup orchestration with improved typing and modularity.
Sequence Diagram(s)

```mermaid
sequenceDiagram
actor User
participant Client as Client/API
participant API as Annotation<br/>Endpoint
participant DB as Database<br/>(MongoDB)
participant Queue as Job Queue<br/>(RQ)
participant Worker as Memory<br/>Worker
User->>Client: POST /api/annotations/<br/>(conversation_id, segment_index,<br/>original_text, corrected_text)
Client->>API: create_annotation()
API->>DB: Conversation.find_one(id, user)
DB-->>API: conversation
API->>DB: TranscriptAnnotation<br/>(create record)
DB-->>API: annotation inserted
API->>DB: Update active segment<br/>with corrected_text
API->>DB: Conversation.save()
DB-->>API: saved
API->>Queue: enqueue_memory_processing<br/>(conversation_id, user_id)
Queue-->>API: job_id
API-->>Client: 200 AnnotationResponse
Client-->>User: Success
Queue->>Worker: Process memory
Worker->>DB: Fetch conversation
Worker->>DB: Extract & deduplicate<br/>memories
Worker->>DB: Store memory version
Worker-->>Queue: Complete
```

```mermaid
sequenceDiagram
participant Scheduler as Cron<br/>Scheduler
participant Redis as Redis
participant DB as Database
participant Worker as Job<br/>Worker
Note over Scheduler: run_scheduler() loop<br/>every 5 seconds
alt Daily interval reached
Scheduler->>DB: Fetch recent conversations<br/>(last 24h)
DB-->>Scheduler: conversations
Scheduler->>DB: Check for existing<br/>annotations
alt 30% probability & no duplicates
Scheduler->>DB: Create TranscriptAnnotation<br/>(MODEL_SUGGESTION, PENDING)
DB-->>Scheduler: inserted
end
Note over Scheduler: Limit: 6 suggestions
end
alt Weekly interval reached
Scheduler->>DB: Fetch accepted<br/>TranscriptAnnotations
DB-->>Scheduler: annotations
Note over Scheduler: Build training dataset<br/>(original→corrected pairs)
Scheduler->>Worker: Mock fine-tune<br/>operation (2s)
Worker-->>Scheduler: Complete
end
Scheduler->>Redis: Sleep 5s
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches: 🧪 Generate unit tests (beta)
- Introduced a new `TranscriptAnnotation` model for managing transcript corrections.
- Added `annotation_routes` for creating and retrieving annotations via API.
- Implemented cron jobs to surface potential errors in transcripts and finetune a model based on accepted corrections.
- Updated Docker Compose to include a cron service for running scheduled tasks.
- Enhanced the web UI to support displaying and managing annotations, including accepting and rejecting suggestions.
- Added tests for annotation model and integration flow to ensure functionality and reliability.
- Introduced a base class `BaseStreamWorker` to standardize audio stream worker implementations, improving code reuse and maintainability.
- Updated `DeepgramStreamWorker` and `ParakeetStreamWorker` to inherit from `BaseStreamWorker`, streamlining their initialization and configuration validation processes.
- Enhanced error handling and logging for Redis connections and consumer initialization in the worker classes.
- Refactored memory processing logic to improve conversation text extraction and speaker identification.
- Added unit tests for audio stream workers to ensure reliability and correct functionality across various scenarios.
- Introduced comprehensive unit tests for the `TranscriptAnnotation` model, validating default values and enum functionality.
- Added a test suite for conversation models, including `Conversation`, `TranscriptVersion`, and `MemoryVersion`, ensuring proper initialization and behavior.
- Implemented unit tests for the `ObsidianService`, covering search functionality, database setup, and error handling.
- Created a new directory for audio stream worker tests, establishing a foundation for testing audio processing logic.
- Enhanced overall test coverage to improve reliability and maintainability of the codebase.
- Introduced a new test suite for the annotation flow, validating the interaction between the `TranscriptAnnotation` model and conversation data.
- Implemented asynchronous tests using `pytest` and `httpx` to ensure proper API functionality and database interactions.
- Mocked dependencies to simulate database operations and validate the correctness of annotation creation and memory processing.
- Enhanced test coverage for the annotation feature, contributing to overall reliability and maintainability of the codebase.
@coderabbitai review
✅ Actions performed: Review triggered.
Okay, well, the test-with-api-key thing didn't work. Fuck that.
I'll try to debug the mock tests on top of this branch and make a commit
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py (2)
313-356: ⚠️ Potential issue | 🟠 Major

`update_memory` still writes `updated_at` to the payload, but it's never read back.

`add_memories` no longer stores `updated_at` (line 132), and `search_memories`/`get_memories` no longer reconstruct it into the `MemoryEntry` (lines 195, 235). However, `update_memory` still writes `"updated_at"` at line 325. This means the field is persisted in Qdrant for updated memories but silently dropped on every subsequent retrieval — `MemoryEntry.updated_at` will always be `None` (or default to `created_at` via `__post_init__`). Either remove `updated_at` from `update_memory` as well, or restore reading it in `search_memories`/`get_memories` so the value round-trips correctly.

Option A: Remove `updated_at` from `update_memory` to match the rest

```diff
         try:
             payload = {
                 "content": new_content,
                 "metadata": new_metadata,
-                "updated_at": str(int(time.time())),
             }
```

Option B: Restore reading `updated_at` in search/get to preserve the value

In `search_memories`:

```diff
                 created_at=result.payload.get("created_at"),
+                updated_at=result.payload.get("updated_at"),
```

In `get_memories`:

```diff
                 created_at=point.payload.get("created_at"),
+                updated_at=point.payload.get("updated_at"),
```
383-387: ⚠️ Potential issue | 🔴 Critical

Critical runtime error: callers in `chronicle.py` will fail with `AttributeError`.

The `get_memory` method was removed from the vector store class, but it's still being called in `backends/advanced/src/advanced_omi_backend/services/memory/providers/chronicle.py` at lines 332 and 369 via `self.vector_store.get_memory()`. This will cause `AttributeError` at runtime when attempting to retrieve or update memories. Either restore the `get_memory` method to the vector store class or replace those calls with an alternative retrieval method.
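If restoring the method is the chosen route, one low-risk option is a thin shim that reuses the surviving `get_memories` lookup instead of re-implementing the Qdrant query. This is only a sketch: the `get_memories(user_id=...)` signature and the `id` attribute on the returned entries are assumptions, not confirmed from this diff.

```python
from typing import Optional

# Hypothetical shim on the vector store class; verify the real get_memories()
# signature and MemoryEntry fields before adopting it.
async def get_memory(self, memory_id: str, user_id: Optional[str] = None):
    """Return a single memory by id, or None if it is not found."""
    memories = await self.get_memories(user_id=user_id)
    return next((m for m in memories if str(m.id) == str(memory_id)), None)
```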
🤖 Fix all issues with AI agents
In `@backends/advanced/src/advanced_omi_backend/app_factory.py`:
- Around line 111-119: The import and Beanie model list incorrectly reference a
non-existent Annotation class; remove Annotation from the import statement that
currently reads "from advanced_omi_backend.models.annotation import Annotation,
TranscriptAnnotation" and also remove Annotation from the document_models list
passed to init_beanie (which currently includes User, Conversation,
AudioChunkDocument, WaveformData, Annotation, TranscriptAnnotation), leaving
only the actual model classes (e.g., TranscriptAnnotation, AudioChunkDocument,
WaveformData, Conversation, User) so the app no longer raises ImportError on
startup.
In `@backends/advanced/src/advanced_omi_backend/cron.py`:
- Around line 34-38: Initialize last_suggestion_run and last_training_run as
timezone-aware datetimes (e.g., set via
datetime.min.replace(tzinfo=timezone.utc)) or None and add a brief comment if
the immediate first-run behavior for surface_error_suggestions and
finetune_hallucination_model is intentional; replace datetime.utcnow() with
datetime.now(timezone.utc) throughout to use timezone-aware "now" and ensure any
subtraction or comparisons between now and last_* use matching tz-aware values
to avoid TypeError.
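As a concrete illustration of the prompt above, here is a minimal sketch assuming the scheduler keeps its state in module-level variables; `interval_elapsed` is a hypothetical helper, not an existing function in cron.py.

```python
from datetime import datetime, timedelta, timezone

# Timezone-aware sentinels: datetime.min with tzinfo attached keeps the very first
# loop iteration triggering both jobs immediately, mirroring the current behavior.
last_suggestion_run = datetime.min.replace(tzinfo=timezone.utc)
last_training_run = datetime.min.replace(tzinfo=timezone.utc)


def interval_elapsed(last_run: datetime, interval: timedelta) -> bool:
    """Both operands are aware, so the subtraction can never raise TypeError."""
    now = datetime.now(timezone.utc)
    return now - last_run >= interval
```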
In `@backends/advanced/src/advanced_omi_backend/models/annotation.py`:
- Line 20: TranscriptAnnotation currently overrides id as a plain str (id: str =
Field(default_factory=lambda: str(uuid.uuid4()))) which diverges from Beanie
conventions; update TranscriptAnnotation.id to use Beanie's PydanticObjectId
type and default to an ObjectId (import PydanticObjectId from beanie and change
the field declaration on the TranscriptAnnotation class), or alternatively
remove the custom id so Beanie provides the default _id, and if you need a UUID
string keep it as a separate field like transcript_uuid; update imports to
include PydanticObjectId and adjust any code that assumes id is a str to use its
string representation when needed.
In `@backends/advanced/src/advanced_omi_backend/routers/api_router.py`:
- Line 27: Remove the duplicate import and double-mounted router: delete the
second import of annotation_router so it is imported only once, and remove one
of the two include_router(...) calls that registers annotation_router (either
the one that uses the prefix inside annotation_routes.py or the explicit
include_router(annotation_router, prefix="/annotations") call); prefer keeping
the explicit include_router(annotation_router, prefix="/annotations") mounting
and ensure annotation_routes.py’s internal prefix is not also applied to avoid
registering endpoints twice.
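A sketch of the single-mount wiring the prompt describes. The router variable name and the `/annotations` prefix come from the prompt; the import path and the `tags` argument are assumptions.

```python
from fastapi import APIRouter

# Assumed import path, matching the module layout referenced in the prompt.
from advanced_omi_backend.routers.modules.annotation_routes import router as annotation_router

api_router = APIRouter()

# Import once, mount once: the prefix is applied here, so annotation_routes.py
# must not also declare its own prefix, otherwise endpoints register twice.
api_router.include_router(annotation_router, prefix="/annotations", tags=["annotations"])
```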
In `@backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py`:
- Line 36: Remove the duplicate import and duplicate __all__ entry for
annotation_router: keep the original import of annotation_router (the one
already present) and delete the second "from .annotation_routes import router as
annotation_router" and the repeated "annotation_router" string in the __all__
list so annotation_router is only imported and exported once.
In
`@backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py`:
- Around line 94-115: The GET handler get_annotations currently only filters
TranscriptAnnotation by user_id but doesn't verify that the conversation itself
belongs to current_user; update get_annotations to first load the Conversation
(or equivalent model) by conversation_id and check its owner against
current_user.id (or raise HTTPException 404/403), mirroring the POST endpoint's
ownership check, and only then query TranscriptAnnotation.find(...) for
annotations to build AnnotationResponse objects; reference get_annotations,
current_active_user, TranscriptAnnotation, AnnotationResponse and the
Conversation model/name used in the codebase when implementing the check.
- Around line 56-82: The annotation is being persisted before validating the
conversation's active transcript and segment index, which can leave orphaned
annotations; move the validation and transcript update logic to run before
calling new_annotation.insert(): first check conversation.active_transcript
exists and that 0 <= annotation.segment_index < len(version.segments), update
version.segments[annotation.segment_index].text and replace the entry in
conversation.transcript_versions and call await conversation.save(), then call
await new_annotation.insert(), and finally call enqueue_memory_processing(...).
Keep the existing symbols (new_annotation.insert(),
conversation.active_transcript, annotation.segment_index,
conversation.transcript_versions, conversation.save(),
enqueue_memory_processing) and raise the same HTTPExceptions on validation
failure.
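A sketch of the reordered flow, pulled out as a hypothetical helper so the steps read in isolation; the model attributes (`active_transcript`, `segments`, `segment_index`, `corrected_text`) follow the prompt and the request schema above, while everything else is assumed.

```python
from fastapi import HTTPException

# Hypothetical helper showing the order of operations inside create_annotation;
# `annotation` is the request body, `conversation` is already loaded and
# ownership-checked, `new_annotation` is the unsaved TranscriptAnnotation, and
# `enqueue` stands in for enqueue_memory_processing.
async def apply_correction(annotation, conversation, new_annotation, enqueue) -> None:
    # 1. Validate first, so a bad request cannot leave an orphaned annotation behind.
    version = conversation.active_transcript
    if version is None:
        raise HTTPException(status_code=400, detail="Conversation has no active transcript")
    if not 0 <= annotation.segment_index < len(version.segments):
        raise HTTPException(status_code=400, detail="segment_index out of range")

    # 2. Apply the correction to the active version and persist the conversation.
    version.segments[annotation.segment_index].text = annotation.corrected_text
    await conversation.save()

    # 3. Only now persist the annotation and enqueue downstream memory processing.
    await new_annotation.insert()
    enqueue(conversation.id, conversation.user_id)
```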
In `@backends/advanced/src/advanced_omi_backend/workers/annotation_jobs.py`:
- Around line 20-24: Replace the naive UTC call with an aware UTC datetime:
compute "since" using datetime.now(timezone.utc) minus timedelta to produce an
aware datetime (matching TranscriptAnnotation timestamps) before passing it into
Conversation.find; update the import if needed (ensure timezone is imported) and
reference the existing "since" variable in annotation_jobs.py where
Conversation.find({"created_at": {"$gte": since}}) is invoked so the query
compares two timezone-aware datetimes.
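A minimal sketch of the aware cutoff; the 24-hour window and the query shape mirror the prompt, and the `Conversation` import path is taken from the diff shown later in this review.

```python
from datetime import datetime, timedelta, timezone

from advanced_omi_backend.models.conversation import Conversation

# Timezone-aware cutoff so the Mongo query compares like with like against the
# aware created_at timestamps used by TranscriptAnnotation.
since = datetime.now(timezone.utc) - timedelta(hours=24)
recent_conversations = Conversation.find({"created_at": {"$gte": since}})
```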
In `@backends/advanced/src/advanced_omi_backend/workers/base_audio_worker.py`:
- Around line 126-138: The shutdown can hang because awaiting consumer_task has
no timeout and stop_wait_task is never cancelled; modify the shutdown sequence
in the method that calls self.consumer.stop() so you await consumer_task with a
bounded timeout (e.g., use asyncio.wait_for on consumer_task) and handle
asyncio.TimeoutError by cancelling consumer_task and logging; also ensure
stop_wait_task is cancelled if consumer_task completes first (call
stop_wait_task.cancel() and await it safely), and keep existing exception
handling for CancelledError and other exceptions when awaiting consumer_task and
stop_wait_task; reference self.consumer.stop(), consumer_task,
start_consuming(), and stop_wait_task when applying these changes.
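A sketch of a bounded shutdown, written as a standalone helper with an illustrative 10-second timeout; the task names follow the prompt, and whether `consumer.stop()` needs to be awaited depends on the real consumer.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def shutdown_consumer(consumer, consumer_task: asyncio.Task, stop_wait_task: asyncio.Task) -> None:
    """Bounded shutdown sketch; hypothetical helper, timeout value is illustrative."""
    consumer.stop()  # await this instead if the real consumer exposes an async stop()
    try:
        # Bound the wait so a stuck consumer cannot hang shutdown forever.
        await asyncio.wait_for(consumer_task, timeout=10)
    except asyncio.TimeoutError:
        logger.warning("Consumer task did not finish in time; cancelling it")
        consumer_task.cancel()
        try:
            await consumer_task
        except asyncio.CancelledError:
            pass
    # If the consumer completed first, the stop-event waiter is still pending: cancel it too.
    if not stop_wait_task.done():
        stop_wait_task.cancel()
        try:
            await stop_wait_task
        except asyncio.CancelledError:
            pass
```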
In `@backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py`:
- Around line 285-300: The try/except around enqueuing the summary job in
enqueue_memory_processing currently logs and re-raises the error which causes
the caller to never receive the already-enqueued primary memory job (job).
Change the except block so it logs the full exception (include e) and continues
without raising (so the function can return job); if you must re-raise in other
contexts, use bare "raise" to preserve traceback; update references:
enqueue_memory_processing, summary_job_id, generate_title_summary_job, job, and
logger.
- Around line 138-178: The NameError occurs because processing_time is only set
inside the if conversation_model: block; compute processing_time before
fetching/refetching the Conversation so it's always defined (e.g., right after
the success and created_memory_ids check and before calling
Conversation.find_one). Keep the existing usage of processing_time in
conversation_model.add_memory_version and logger.info unchanged; this ensures
processing_time is available whether conversation_model is None or not while
preserving the version creation logic in add_memory_version, and reference the
existing symbols conversation_model, Conversation.find_one, start_time,
processing_time, created_memory_ids, memory_service, and logger.info when making
the change.
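A sketch that combines both fixes in one place; the function, its parameters, and the `add_memory_version`/`queue.enqueue` signatures are stand-ins for the real code in memory_jobs.py.

```python
import logging
import time

logger = logging.getLogger(__name__)


def finish_memory_run(start_time: float, created_memory_ids, conversation_model, queue, conversation_id: str):
    """Sketch of the two fixes; surrounding job code and real signatures are assumed."""
    # Fix 1: bind processing_time before any conversation refetch, so it exists
    # even when conversation_model turns out to be None.
    processing_time = time.time() - start_time

    if conversation_model:
        conversation_model.add_memory_version(created_memory_ids, processing_time)  # signature assumed
        logger.info("Stored memory version in %.2fs", processing_time)

    # Fix 2: log a summary-job enqueue failure but do not re-raise, so the caller
    # still receives the primary memory job that was already enqueued.
    try:
        summary_job = queue.enqueue("generate_title_summary_job", conversation_id)
        logger.info("Enqueued summary job %s", summary_job.id)
    except Exception as e:
        logger.error("Failed to enqueue summary job: %s", e, exc_info=True)
```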
In `@backends/advanced/tests/unit/workers/test_audio_stream_workers.py`:
- Around line 240-253: The test is calling instance.run() instead of testing the
classmethod start(); update
TestParakeetStreamWorker.test_start_method_runs_worker to invoke
ParakeetStreamWorker.start() (or await ParakeetStreamWorker.start() if async)
rather than creating an instance and awaiting run(), keep the existing patches
for ParakeetStreamWorker.run (AsyncMock) and asyncio.run so the classmethod
triggers the patched run and then assert mock_run.assert_called_once(); this
replaces the worker_instance = ParakeetStreamWorker(); await
worker_instance.run() sequence with a call to the start() entrypoint.
- Around line 175-188: The test test_start_method_runs_worker is misnamed and
doesn't exercise DeepgramStreamWorker.start; change it to call
DeepgramStreamWorker.start() synchronously (instead of awaiting
worker_instance.run()) and assert that asyncio.run was invoked with the worker's
run() coroutine; keep the AsyncMock patch on DeepgramStreamWorker.run to prevent
real execution and verify mock run was scheduled via asyncio.run, and remove the
unused direct await of run().
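A sketch of what either corrected test could look like, shown here for the Deepgram worker and assuming `start()` is a synchronous classmethod that hands the worker's `run()` coroutine to `asyncio.run`; the import path is illustrative.

```python
from unittest.mock import AsyncMock, patch

# Illustrative import path; adjust to wherever DeepgramStreamWorker actually lives.
from advanced_omi_backend.workers.deepgram_stream_worker import DeepgramStreamWorker


def test_start_method_runs_worker():
    """Exercise the start() entrypoint instead of awaiting run() directly."""
    with patch.object(DeepgramStreamWorker, "run", new_callable=AsyncMock) as mock_run:
        with patch("asyncio.run") as mock_asyncio_run:
            DeepgramStreamWorker.start()

            # start() should build a worker and hand its run() coroutine to asyncio.run.
            mock_asyncio_run.assert_called_once()
            mock_run.assert_called_once()
```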
In `@services.py`:
- Line 274: The string passed to console.print in the failure case uses an
unnecessary f-string prefix; remove the leading f from the literal in the
console.print call so it becomes a normal string (i.e., change the console.print
call that currently reads console.print(f"[red]❌ Command failed[/red]") to use a
plain string), keeping the same message and formatting; this fixes the Ruff F541
warning in the function containing that console.print.
- Around line 81-95: _get_service_path currently assumes service_name exists in
SERVICES and will raise KeyError; add an explicit membership check for
service_name in SERVICES at the start of _get_service_path, and if missing call
console.print with a clear error (e.g., "Service not found: {service_name}") and
return None so callers like run_compose_command get a clean failure instead of
an exception; keep the existing checks for service_path.exists() and
compose_file.exists() (service_path and compose_file symbols) unchanged.
- Around line 466-470: When args.all is set we currently build
services_to_process by filtering SERVICES via check_service_configured, which
causes stop --all to skip services missing a .env; change the logic so that for
the stop action we do not filter by check_service_configured: detect the
command/action (e.g., stop vs start/restart) and when action == "stop" populate
services_to_process = list(SERVICES.keys()) (or remove the
check_service_configured filter), while keeping the existing check for
start/restart so start still requires a configured service; update references to
services_to_process and args.all accordingly.
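A sketch of the guarded lookup described for `_get_service_path`; the `SERVICES` mapping shape and the compose file location are assumptions standing in for the real definitions in services.py.

```python
from pathlib import Path
from typing import Optional

from rich.console import Console

console = Console()

# Stand-in mapping; the real SERVICES dict lives in services.py and may differ in shape.
SERVICES = {"advanced": {"path": "backends/advanced"}}


def _get_service_path(service_name: str) -> Optional[Path]:
    """Return the service directory, or None (with a printed error) instead of raising KeyError."""
    if service_name not in SERVICES:
        console.print(f"[red]Service not found: {service_name}[/red]")
        return None
    service_path = Path(SERVICES[service_name]["path"])
    if not service_path.exists():
        console.print(f"[red]Service path does not exist: {service_path}[/red]")
        return None
    compose_file = service_path / "docker-compose.yml"
    if not compose_file.exists():
        console.print(f"[red]Compose file not found: {compose_file}[/red]")
        return None
    return service_path
```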
In `@wizard.py`:
- Line 551: Remove the unnecessary f-string prefixes on the console.print calls
that don't use any interpolations; specifically update the console.print
invocation(s) in wizard.py (e.g., the call that prints "🎊 [bold green]Setup
Complete![/bold green]" and the other similar console.print at line ~560) to use
plain string literals instead of f-strings (remove the leading 'f' from those
string literals).
🧹 Nitpick comments (19)
backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py (1)
245-270: Redundant `import uuid` — already imported at module level (line 11).

Both `delete_memory` (line 249) and `update_memory` (line 330) re-import `uuid` inside the function body even though it's already imported at the top of the file.

Remove the in-function imports:

```diff
     async def delete_memory(self, memory_id: str, user_id: Optional[str] = None, user_email: Optional[str] = None) -> bool:
         """Delete a specific memory from Qdrant."""
         try:
             # Convert memory_id to proper format for Qdrant
-            import uuid
             try:
```

And similarly in `update_memory`:

```diff
             # Convert memory_id to proper format for Qdrant
             # Qdrant accepts either UUID strings or unsigned integers
-            import uuid
             try:
```

backends/advanced/src/advanced_omi_backend/workers/base_audio_worker.py (1)
17-22: Module-level logging side effect may interfere with application logging config.

Calling `logging.basicConfig()` at module import time can silently override or conflict with the application's logging configuration established elsewhere (e.g., in `cron.py` lines 12-16). The guard `if not logging.getLogger().handlers` helps, but import order becomes significant.

Consider removing this and relying on the application entry point to configure logging.
backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py (1)
152-160: Silent `except: pass` swallows errors in provider detection.

The bare `except Exception: pass` on lines 159-160 hides any bugs in the provider detection logic. At minimum, log at debug level.

Proposed fix:

```diff
         except Exception:
-            pass
+            logger.debug("Could not determine memory provider, using default", exc_info=True)
```

backends/advanced/tests/unit/workers/test_audio_stream_workers.py (1)
256-341: Integration tests are solid but don't test actual signal handling.

The "handles shutdown signal" tests (lines 261, 292) verify natural consumer completion and cleanup, which is valuable. However, they don't actually send a signal (e.g., via `os.kill`) to exercise the signal handler path. Consider renaming to reflect what's actually tested (e.g., `test_deepgram_worker_handles_natural_completion`), or adding a test that sets the `stop_event` mid-run.

backends/advanced/tests/unit/test_annotation_flow.py (3)
28-31: Unused `client` fixture.

This fixture is defined but never used — the test creates its own `AsyncClient` with `ASGITransport` on line 83. Remove it to avoid confusion.

Proposed fix:

```diff
-@pytest.fixture
-async def client(app):
-    async with AsyncClient(app=app, base_url="http://test") as c:
-        yield c
-
```
57-57: Fragile mock pattern for `find_one`.

`AsyncMock(return_value=mock_conv)()` immediately invokes the mock to produce a coroutine object. This works once, but if `find_one` were called a second time in the code under test, it would fail because the coroutine is already exhausted. A cleaner approach:

```python
MockConversation.find_one = AsyncMock(return_value=mock_conv)
```

This way each call to `find_one(...)` returns a fresh awaitable resolving to `mock_conv`.
106-108: Assertion on `call_kwargs` may fail if the route passes positional args.

If `enqueue_memory_processing` is called with positional arguments rather than keyword arguments, `mock_enqueue.call_args.kwargs` will be empty and these assertions will fail with a `KeyError`. Consider using `call_args[0]` or `assert_called_once_with(...)` for robustness.

backends/advanced/tests/unit/test_annotation_models.py (1)
8-10: Consider extracting `initialize_beanie()` into a session-scoped fixture.

`initialize_beanie()` is called at the start of every test method, re-initializing the mock database each time. A `@pytest.fixture(autouse=True)` or session-scoped fixture would reduce duplication and make the test setup clearer.

backends/advanced/src/advanced_omi_backend/models/annotation.py (2)
30-31: `updated_at` is never automatically updated on save.

The field gets its default at creation time but won't be refreshed on subsequent saves. It will always equal `created_at` unless callers remember to set it manually, which makes it misleading.

Consider adding a Beanie `before_event` hook or a `save()` override:

Proposed fix:

```diff
+    from beanie import before_event, Replace, Update
+
+    @before_event(Replace, Update)
+    def update_timestamp(self):
+        self.updated_at = datetime.now(timezone.utc)
```
2-2: Unused imports: `Optional` and `List`.

Neither `Optional` nor `List` is used in this file.

backends/advanced/src/advanced_omi_backend/routers/modules/annotation_routes.py (1)
46-53: No validation that `original_text` matches the actual segment text.

The caller-supplied `original_text` is stored as-is without verifying it matches the current segment content. If the segment was already modified (e.g., by a prior annotation), this could create a misleading audit trail. Consider comparing against `version.segments[annotation.segment_index].text` and rejecting mismatches.

services.py (2)
135-166: `_get_speaker_recognition_cmd_args` conflates profile flags and command args.

For the `"up"` path the returned list mixes profile args (`--profile`, compute_mode) with command args (`up`, `-d`, service names). This means `_construct_docker_cmd` at line 196 appends them directly to `["docker", "compose"]`, producing a correct command only because `_build_base_cmd` is intentionally skipped. This implicit contract is fragile — if a new command (e.g. `"logs"`) is added later, the empty-list fallback silently produces `docker compose` with no subcommand.

Consider adding a brief docstring or inline comment clarifying this dual-return contract so future maintainers don't break the invariant.
337-348: Misleading indentation on multi-line string arguments.

Lines 343 and 347 align the string argument at the same indentation level as `console.print(`, making it look like a top-level statement rather than a continuation. This is syntactically valid but hurts readability.

Proposed fix:

```diff
     if recreate:
         console.print(
-        "[dim]Using down + up to recreate containers (fixes WSL2 bind mount issues)[/dim]\n"
+            "[dim]Using down + up to recreate containers (fixes WSL2 bind mount issues)[/dim]\n"
         )
     else:
         console.print(
-        "[dim]Quick restart (use --recreate to fix bind mount issues)[/dim]\n"
+            "[dim]Quick restart (use --recreate to fix bind mount issues)[/dim]\n"
         )
```

wizard.py (6)
104-126: Hardcoded service name list in `check_service_exists` is fragile.

Line 113 explicitly checks `service_name in ["advanced", "speaker-recognition", "asr-services"]` to decide whether to look for `init.py` vs `setup.sh`. If a new service is added to `SERVICES` that also uses `init.py`, this list must be manually updated.

Consider deriving the expected script from the service's `cmd` list (e.g., check if the last element is `"init.py"`) instead of maintaining a separate allowlist.

Proposed fix:

```diff
-    # For services with Python init scripts
-    if service_name in ["advanced", "speaker-recognition", "asr-services"]:
-        script_path = service_path / "init.py"
+    # Derive expected script from the command definition
+    cmd = service_config.get("cmd", [])
+    script_name = cmd[-1] if cmd else None
+    if script_name and script_name.endswith(".py"):
+        script_path = service_path / script_name
         if not script_path.exists():
             return False, f"Script {script_path} does not exist"
     else:
-        # For other extras (shell scripts)
-        script_path = service_path / "setup.sh"
+        script_name = script_name or "setup.sh"
+        script_path = service_path / script_name
         if not script_path.exists():
```
187-223: Sensitive tokens passed as command-line arguments are visible in process listings.

`--hf-token` (line 200), `--deepgram-api-key` (line 209), and similarly `--neo4j-password` (line 182 in `_configure_advanced_backend`) and `--openai-api-key` (line 247) are all appended to the subprocess command line. On multi-user systems, these are visible via `ps aux`. For a local dev setup wizard this is common practice, but worth noting for future hardening — environment variables or temp files would be more secure.
254-308: Bare `Exception` catch at line 306 is acceptable here but could mask unexpected errors.

In a setup wizard CLI, a catch-all for unexpected subprocess failures is pragmatic. The static analysis tool flags it (BLE001), and while it's fine for now, logging the exception type alongside the message would aid debugging.

Proposed minor improvement:

```diff
-    except Exception as e:
-        console.print(f"❌ {service_name} setup failed (unexpected): {e}")
+    except Exception as e:
+        console.print(f"❌ {service_name} setup failed (unexpected {type(e).__name__}): {e}")
```
418-428: `setup_obsidian` password loop has no upper-bound on retries.

The `while True` loop at line 418 will loop indefinitely if the user keeps entering short passwords without providing a valid one (and doesn't trigger `EOFError`). This is fine for an interactive wizard but consider adding a max-attempts guard for robustness.
447-477: `pip install pre-commit` modifies the user's environment without explicit consent.

Line 452 silently runs `pip install pre-commit`. In most setups this is expected, but if the user is in a managed or system Python environment, this could be surprising. A brief console message before the install (e.g., "Installing pre-commit…") would improve transparency. The `check=False` and `DEVNULL` suppression mean failures are silent, which is fine for an optional step.
538-549: Config file re-read at end of `main` duplicates logic already known from `setup_obsidian`.

Lines 538–549 re-read `config.yml` to check if Obsidian is enabled, but `obsidian_enabled` (from `setup_obsidian` at line 517) already captures the user's intent. The only additional case this covers is if the template already had obsidian enabled before the wizard ran. If that's intentional, a brief comment explaining why would help. Otherwise, this block can be simplified to just use the existing `obsidian_enabled` flag.
```diff
 from advanced_omi_backend.models.annotation import Annotation, TranscriptAnnotation
 from advanced_omi_backend.models.audio_chunk import AudioChunkDocument
 from advanced_omi_backend.models.conversation import Conversation
 from advanced_omi_backend.models.user import User
 from advanced_omi_backend.models.waveform import WaveformData

     await init_beanie(
         database=config.db,
-        document_models=[User, Conversation, AudioChunkDocument, WaveformData, Annotation],
+        document_models=[User, Conversation, AudioChunkDocument, WaveformData, Annotation, TranscriptAnnotation],
```
🧩 Analysis chain

🏁 Scripts executed:

```shell
rg -n 'class Annotation' backends/advanced/src/advanced_omi_backend/models/annotation.py

rg -n '^\s*class ' backends/advanced/src/advanced_omi_backend/models/annotation.py

cat -n backends/advanced/src/advanced_omi_backend/models/__init__.py | head -50
```
Remove the non-existent Annotation class from imports and document models.
Line 111 imports Annotation, which does not exist in models/annotation.py — only TranscriptAnnotation is defined there. This will cause an ImportError at application startup. Remove Annotation from both the import statement and the document_models list on line 119.
🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/app_factory.py` around lines 111 -
119, The import and Beanie model list incorrectly reference a non-existent
Annotation class; remove Annotation from the import statement that currently
reads "from advanced_omi_backend.models.annotation import Annotation,
TranscriptAnnotation" and also remove Annotation from the document_models list
passed to init_beanie (which currently includes User, Conversation,
AudioChunkDocument, WaveformData, Annotation, TranscriptAnnotation), leaving
only the actual model classes (e.g., TranscriptAnnotation, AudioChunkDocument,
WaveformData, Conversation, User) so the app no longer raises ImportError on
startup.
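A sketch of the corrected registration, using the model imports visible in the diff below; the `init_models(config)` wrapper is hypothetical, and only the import list and `document_models` are the point.

```python
from beanie import init_beanie

from advanced_omi_backend.models.annotation import TranscriptAnnotation
from advanced_omi_backend.models.audio_chunk import AudioChunkDocument
from advanced_omi_backend.models.conversation import Conversation
from advanced_omi_backend.models.user import User
from advanced_omi_backend.models.waveform import WaveformData


async def init_models(config) -> None:
    # Register only classes that actually exist; the phantom Annotation import is gone.
    await init_beanie(
        database=config.db,
        document_models=[User, Conversation, AudioChunkDocument, WaveformData, TranscriptAnnotation],
    )
```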
```python
    MODEL_SUGGESTION = "model_suggestion"


    # Identity
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
```
🧩 Analysis chain

🏁 Scripts executed:

```shell
# First, let's understand the structure and check the file in question
find . -type f -name "annotation.py" | head -5

# Check the annotation.py file to see the context
cat -n backends/advanced/src/advanced_omi_backend/models/annotation.py | head -40

# Check if other Beanie Document models also override id with str
rg -n 'id:\s*str\s*=\s*Field' --type=py -C2 | head -100

# Look for Beanie Document imports and usage patterns in the codebase
rg -n 'from beanie import Document|class.*\(Document\)' --type=py | head -50

# Check for Beanie operations like get(), find_one() that might interact with id
rg -n '\.get\(|\.find_one\(|by_id' --type=py | head -50

# Check audio_chunk.py
cat -n backends/advanced/src/advanced_omi_backend/models/audio_chunk.py | head -50

# Check conversation.py
cat -n backends/advanced/src/advanced_omi_backend/models/conversation.py | head -50

# Check waveform.py
cat -n backends/advanced/src/advanced_omi_backend/models/waveform.py | head -50

# Check user.py to see PydanticObjectId usage
cat -n backends/advanced/src/advanced_omi_backend/models/user.py | head -50

# Search for TranscriptAnnotation usage - specifically looking for .get(), .find_one(), or other Beanie operations
rg -n 'TranscriptAnnotation' --type=py -A3 | head -100

# Check if knowledge_graph models are Beanie Documents
cat -n backends/advanced/src/advanced_omi_backend/services/knowledge_graph/models.py | head -80

# Check for any other Beanie Document models with explicit id override
rg -B5 'class\s+\w+\(Document\)' backends/advanced/src/advanced_omi_backend/models/ --type=py | grep -A8 'class.*Document'

# Check if conversation.py has an explicit id field
rg -n 'id.*=.*Field|class Settings' backends/advanced/src/advanced_omi_backend/models/conversation.py | head -20

# Verify the exact pattern in User model to confirm PydanticObjectId is the standard
cat -n backends/advanced/src/advanced_omi_backend/models/user.py | grep -A15 'class User'

# Check if any documentation or configuration mentions this pattern choice
rg -i 'pydanticobjid|str.*uuid|id.*field' backends/advanced/src/advanced_omi_backend/models/ --type=py | head -30
```
Align TranscriptAnnotation.id with Beanie conventions — use PydanticObjectId instead of str.
TranscriptAnnotation is the only Beanie Document in the codebase overriding id with a plain str. All other Beanie models (Conversation, AudioChunkDocument, WaveformData) either don't override it or follow the standard pattern. The User model correctly extends BeanieBaseUser which uses PydanticObjectId. While the current str-based approach works for basic queries, it deviates from Beanie conventions and may cause subtle issues if the codebase later relies on _id being properly mapped. Switch to PydanticObjectId for consistency, or use a separate semantic field (like Conversation.conversation_id) if you need a UUID string.
🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/models/annotation.py` at line 20,
TranscriptAnnotation currently overrides id as a plain str (id: str =
Field(default_factory=lambda: str(uuid.uuid4()))) which diverges from Beanie
conventions; update TranscriptAnnotation.id to use Beanie's PydanticObjectId
type and default to an ObjectId (import PydanticObjectId from beanie and change
the field declaration on the TranscriptAnnotation class), or alternatively
remove the custom id so Beanie provides the default _id, and if you need a UUID
string keep it as a separate field like transcript_uuid; update imports to
include PydanticObjectId and adjust any code that assumes id is a str to use its
string representation when needed.
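A sketch of the simpler of the two suggested shapes: drop the custom `id` override so Beanie supplies its conventional `_id`, and keep a UUID string as a separate semantic field. All other fields of the real model are omitted here.

```python
import uuid

from beanie import Document
from pydantic import Field


class TranscriptAnnotation(Document):
    # No custom `id` override: Beanie provides the PydanticObjectId-backed _id.
    # If a UUID string is still needed, keep it as a separate semantic field instead.
    transcript_uuid: str = Field(default_factory=lambda: str(uuid.uuid4()))
```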
backends/advanced/src/advanced_omi_backend/routers/api_router.py (outdated, resolved)

backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py (outdated, resolved)
```python
    @pytest.mark.asyncio
    async def test_start_method_runs_worker(self):
        """Test that start() class method creates instance and runs it."""
        with patch.object(ParakeetStreamWorker, "run", new_callable=AsyncMock) as mock_run:
            with patch("asyncio.run") as mock_asyncio_run:
                # Simulate script execution
                mock_asyncio_run.side_effect = lambda coro: asyncio.new_event_loop().run_until_complete(
                    coro
                )

                worker_instance = ParakeetStreamWorker()
                await worker_instance.run()

                mock_run.assert_called_once()
```
Same issue as the Deepgram counterpart — start() is never called.
Same problem as TestDeepgramStreamWorker.test_start_method_runs_worker: the test directly awaits run() instead of calling the start() classmethod.
🤖 Prompt for AI Agents
In `@backends/advanced/tests/unit/workers/test_audio_stream_workers.py` around
lines 240 - 253, The test is calling instance.run() instead of testing the
classmethod start(); update
TestParakeetStreamWorker.test_start_method_runs_worker to invoke
ParakeetStreamWorker.start() (or await ParakeetStreamWorker.start() if async)
rather than creating an instance and awaiting run(), keep the existing patches
for ParakeetStreamWorker.run (AsyncMock) and asyncio.run so the classmethod
triggers the patched run and then assert mock_run.assert_called_once(); this
replaces the worker_instance = ParakeetStreamWorker(); await
worker_instance.run() sequence with a call to the start() entrypoint.
| console.print(f"[yellow]Warning: Could not read config.yml: {e}[/yellow]") | ||
|
|
||
| # Final Summary | ||
| console.print(f"\n🎊 [bold green]Setup Complete![/bold green]") |
Remove extraneous f prefixes on strings without placeholders.
Static analysis (Ruff F541) correctly flags lines 551 and 560.
Proposed fix
- console.print(f"\n🎊 [bold green]Setup Complete![/bold green]")
+ console.print("\n🎊 [bold green]Setup Complete![/bold green]")- console.print(f"\n📚 [bold cyan]Obsidian Integration Detected[/bold cyan]")
+ console.print("\n📚 [bold cyan]Obsidian Integration Detected[/bold cyan]")Also applies to: 560-560
🧰 Tools
🪛 Ruff (0.14.14)
[error] 551-551: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In `@wizard.py` at line 551, Remove the unnecessary f-string prefixes on the
console.print calls that don't use any interpolations; specifically update the
console.print invocation(s) in wizard.py (e.g., the call that prints "🎊 [bold
green]Setup Complete![/bold green]" and the other similar console.print at line
~560) to use plain string literals instead of f-strings (remove the leading 'f'
from those string literals).
…nsumer
- Replaced specific Deepgram and Parakeet consumer implementations with a common StreamingTranscriptionConsumer in both DeepgramStreamWorker and ParakeetStreamWorker.
- Updated unit tests to reflect the changes in consumer instantiation, ensuring they validate the new consumer type.
- Enhanced code maintainability by consolidating consumer logic into a single class.

- Implemented methods to retrieve model definitions and infer embedding dimensions for OpenAI-compatible models.
- Enhanced the `_upsert_openai_models` method to update or create model entries in the configuration, ensuring defaults are set correctly.
- Updated the setup process to prompt users for custom API settings, including base URL and model names, improving flexibility in configuration.
- Added unit tests to validate the OpenAI setup flow, ensuring proper handling of custom API initialization and configuration updates.

- Created a GitHub Actions workflow for running unit tests on the advanced backend, triggered by pull requests and pushes to specific branches.
- Added a new `test-unit` target in the Makefile to execute Python unit tests for the advanced backend, ensuring proper test execution and reporting.
- Enhanced the Makefile to include the new test target, improving the testing process for backend development.

…testing
- Modified the GitHub Actions workflow to trigger unit tests using the new `Makefile.unittests` specifically for the advanced backend.
- Created a new `Makefile.unittests` to define the `test-unit` target for running Python unit tests, enhancing the testing process and organization.

…n test execution
- Modified the `test-unit` target in the `Makefile.unittests` to set the `AUTH_SECRET_KEY` environment variable during the execution of Python unit tests, ensuring proper authentication handling for tests.

- Introduced a new `Makefile.unittests` to streamline unit and Robot test execution for the advanced backend.
- Updated the README.md to include instructions for running unit and Robot tests.
- Modified GitHub Actions workflow to reference the new Makefile, ensuring proper test execution on relevant changes.
- Removed the old `Makefile.unittests` from the backends/advanced directory to consolidate test management.
Summary by CodeRabbit
Release Notes
New Features
Tests
Chores